 model-centric approach


The Principles of Data-Centric AI

Communications of the ACM

The role of data and its quality in supporting AI systems is gaining prominence and giving rise to the concept of data-centric AI (DCAI), which breaks away from widespread model-centric approaches. The flurry of conversation around DCAI can be credited to a recent campaign by Andrew Ng, an AI pioneer, and his colleagues. However, DCAI is a culmination of concerns and efforts around improving data quality in AI projects. DCAI can be understood as an emerging term for a wealth of preceding practices and research work around data quality that complements structured frameworks such as human-centered data science.4,5 As such, the nature of 'data work' itself is not necessarily new.35


MEDIAR: Harmony of Data-Centric and Model-Centric for Multi-Modality Microscopy

Lee, Gihun, Kim, Sangmook, Kim, Joonkee, Yun, Se-Young

arXiv.org Artificial Intelligence

Cell segmentation is a fundamental task for computational biology analysis. Identifying the cell instances is often the first step in various downstream biomedical studies. However, many cell segmentation algorithms, including the recently emerging deep learning-based methods, still show limited generality under the multi-modality environment. Weakly Supervised Cell Segmentation in Multi-modality High-Resolution Microscopy Images was hosted at NeurIPS 2022 to tackle this problem. We propose MEDIAR, a holistic pipeline for cell instance segmentation under multi-modality in this challenge. MEDIAR harmonizes data-centric and model-centric approaches as the learning and inference strategies, achieving a 0.9067 F1-score at the validation phase while satisfying the time budget. To facilitate subsequent research, we provide the source code and trained model as open-source: https://github.com/Lee-Gihun/MEDIAR


Data-Centric AI Vs. Model-Centric AI - Everything You Need Know

#artificialintelligence

How much more data do we need to make this model work reliably? The quest to build effective AI models is never-ending. But it is now becoming more data-centric than ever before. That brings us to an even more basic question – What are the fundamental elements of a working AI solution? In the traditional approach, machine learning practitioners primarily show interest in improving the model to make the solution more effective. This strategy is called the model-centric approach. Let's discuss this approach in detail to understand the need for the data-centric approach.


How to implement data-centric AI in NLP

#artificialintelligence

Andrew Ng, a prominent figure in today's AI world, has launched a new movement built around an approach called "data-centric AI". He and his ventures promote it often, and he has repeatedly explained why implementing the approach matters. In this article, we're going to discover what he means by "data-centric AI" and how to implement it in an NLP task. "Data-centric AI is the discipline of systematically engineering the data used to build an AI system."


Data Centric AI Vs. Model Centric AI: How to take maximum advantage of both.

#artificialintelligence

Data-centric AI: The term "data-centric AI" refers to the use of machine learning techniques and algorithms that are optimized for specific kinds of data. This approach is particularly effective in domains where there is a shortage of representative and labelled datasets. Industries such as healthcare, manufacturing, and agriculture often have large volumes of unlabeled data and require an AI model to be trained from these sources. A data-centric approach emphasizes the technical aspects of a task instead of focusing on the algorithm itself. To demonstrate how useful such an approach is, Andrew Ng and his team organized a competition called the Data-Centric AI Competition.


Why MLOps Needs to Be Data-Centric

#artificialintelligence

The word 'MLOps' has become a hot keyword these days. The trend is poised to continue as AI takes on more roles in industry and in society as a whole. This article aims to explain what MLOps is and why the concern for data quality should be at the center of MLOps. The word MLOps is a combination of Machine Learning (ML) and Operations (MLOps = ML + Ops). It refers to the set of engineering practices used to develop and operate machine learning models in production.


From Model-centric to Data-centric Artificial Intelligence

#artificialintelligence

Two basic components of all AI systems are data and model; both go hand in hand in producing the desired results. In this article we talk about how the AI community has been biased towards putting more effort into the model, and see how that is not always the best approach. We all know that machine learning is an iterative process, because machine learning is largely an empirical science. You do not jump to the final solution by thinking about the problem, because you cannot easily articulate what the solution should look like. Hence you move empirically towards better solutions.


Authoritative Intelligence: How Data Labelling Increases the Accuracy of AI Models

#artificialintelligence

Whereas many still cling to the former, some, including Artificial Intelligence (AI) luminary Andrew Ng, fervently argue that data, not models, must be at the core of the advancement of AI. It turns out he's got a point; in fact, more than one. But let's start at the beginning. While all ML models essentially try to make predictions, the labels' accuracy determines whether these predictions hold true in real life. In other words, the labeled aspects of data need to be invariably consistent with the "outside world," i.e. the actual conditions for which the model was designed. For this very reason, as far as the model is concerned, data labels come before everything else.

How Data-Centric Platforms Solve the Biggest Challenges for MLOps

#artificialintelligence

Recently, I learned that the failure rate for machine learning projects is still astonishingly high. Studies suggest that between 85% and 96% of projects never make it to production. These numbers are even more remarkable given the growth of machine learning (ML) and data science in the past five years. For businesses to be successful with ML initiatives, they need a comprehensive understanding of the risks and how to address them. In this post, we attempt to shed light on how to achieve this by moving away from a model-centric view of ML systems towards a data-centric view. Of course, everyone knows that data is the most important component of ML. Nearly every data scientist has heard "garbage in, garbage out" and "80% of a data scientist's time is spent cleaning data".


From model-centric to data-centric

#artificialintelligence

In my last blog post I covered the rise of DataPrepOps and the importance of data preparation for getting optimized results from Machine Learning based solutions. The value of data and its impact on the quality of ML-based solutions have, for sure, been underestimated so far, but this is changing -- in his latest session, Andrew Ng covered the benefits of a bigger investment in data preparation, with his team showing that improving the quality of existing data can be as effective as collecting three times as much data. And that is what I'll be covering today -- the role of data quality in taking AI to the next level. What is the right balance to achieve success? With datasets publicly available, through open databases or Kaggle, for example, I understand why the approach has been more model-centric: the data was, in essence, more or less well-behaved, which meant that to improve the solutions, the focus had to be on the only element that had more freedom to be tweaked and changed, the code.